Re: UTF-8 question.
From | Dan Sugalski |
---|---|
Subject | Re: UTF-8 question. |
Date | |
Msg-id | a06110407bd6fe9f9673a@[192.168.1.105] |
In response to | UTF-8 question. ("Richard Connamacher" <rich.n1@indieimage.com>) |
List | pgsql-general |
At 8:39 PM -0400 9/16/04, Richard Connamacher wrote:
>I'm new to PostgreSQL, and from the looks of it, it's a great database,
>and I'll be using more of it in the future.
>
>I had a quick question if anyone could clear this up. The documentation
>for PostgreSQL (version 7.1, the version this server is using) says that
>it supports multibyte character encodings like Unicode (which implies
>UTF-16 encoding).

Don't confuse Unicode, the 'character set' and rules for characters,
represented by a sequence of abstract 32 bit integers, with
UTF-[8|16|32] which is a way to encode those abstract integers into a
stream of bytes someplace.

> Later on, the same page says that Unicode is
>represented using UTF-8 encoding. UTF-8 is the 8-bit version of Unicode.
>The multibyte version of Unicode is UTF-16.
>
>So, which is it? If I create a database using Unicode as the encoding,
>will the encoding be UTF-8 (singlebyte) or UTF-16 (multibyte)?

Erm... UTF-8 *is* a multibyte encoding. Up to 6 bytes per code point,
if things get really degenerate. (And, last I checked, means you can
have up to 70 bytes for really degenerate characters, but my memory
might be off (could be 80))

UTF-8, UTF-16, and UTF-32 will all encode Unicode characters just fine.
--
Dan

--------------------------------------it's like this-------------------
Dan Sugalski                          even samurai
dan@sidhe.org                         have teddy bears and even
                                      teddy bears get drunk
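
To make the "UTF-8 is multibyte" point concrete, here is a minimal Python sketch (not part of the original message) that counts how many bytes one character takes in each encoding. The helper name `encoded_lengths` and the sample characters are made up for illustration. One caveat: Python's codecs follow the current UTF-8 definition (RFC 3629), which stops at 4 bytes per code point; the 6-byte figure above comes from the older, wider definition of UTF-8.

```python
def encoded_lengths(ch: str) -> dict:
    """Return the number of bytes each encoding needs for one character.

    Hypothetical helper, written only for this illustration.
    """
    return {
        "utf-8": len(ch.encode("utf-8")),
        # The explicit big-endian variants avoid counting a byte order mark.
        "utf-16": len(ch.encode("utf-16-be")),
        "utf-32": len(ch.encode("utf-32-be")),
    }


# Sample code points, chosen to cover each UTF-8 length class:
# U+0041 (A), U+00E9 (e with acute), U+20AC (euro sign), U+1F600 (emoji).
samples = ["\u0041", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    print(f"U+{ord(ch):04X}", encoded_lengths(ch))
```

Only plain ASCII fits in a single UTF-8 byte; everything else takes two or more, and UTF-16 itself needs four bytes (a surrogate pair) for code points above U+FFFF. So none of the three encodings is "the singlebyte one"; they are just different byte layouts for the same code points.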